FIGS. 3A and 3B are an overview of selected components of the basic LSA 
paradigm, in accordance with one embodiment of the present invention; 

Please replace the paragraph beginning on page 6 at line 12 with the following 
paragraph: 

FIGS. 4A and 4B are an overview of selected components of the adaptive LSA 
paradigm, in accordance with one embodiment of the present invention; 

Please replace the paragraph beginning on page 6 at line 14 with the following 
paragraph: 

FIGS. 5A and 5B are an overview of selected components of the matrix 
transformation of the adaptive LSA paradigm, in accordance with one embodiment of the 
present invention; 

Please replace the paragraph beginning on page 6 at line 16 with the following 
paragraph: 

FIGS. 6A and 6B are an overview of selected components of the vector 
transformation of the adaptive LSA paradigm, in accordance with one embodiment of the 
present invention; 



Please replace the paragraph beginning on page 6 at line 18 with the following 
paragraph: 

FIGS. 7A and 7B are an overview of selected components of prior art baseline 
adaptation; 

Please replace the paragraph beginning on page 11 at line 18 with the following 
paragraph: 

FIGS. 3A and 3B illustrate selected components of the basic LSA paradigm 300 
used to construct the continuous vector space S, referenced in FIG. 3B as LSA space S 
316. The LSA paradigm 300 first captures the semantic patterns of the word-document 
co-occurrences that appeared in the training corpus T 202 by constructing a 
word-document matrix W 302 of dimension M x N, whose entries $w_{ij}$ 304 suitably 
reflect the extent to which word $w_i$ 208 appeared in document $d_j$ 204, and then 
performing a singular value decomposition (SVD) of the word-document matrix W 302 
having an order of decomposition of $R \ll \min(M, N)$, as in [1]: 

    $W = U S V^T$,    (1) 

where U 306 is the M x R left singular matrix of row vectors $u_i$ ($1 \le i \le M$), 
S 308 is the R x R diagonal matrix of singular values 
$s_1 \ge s_2 \ge \dots \ge s_R > 0$, and $V^T$ is the transposition of V 310, the 
R x N right singular matrix of row vectors $v_j$ ($1 \le j \le N$). The value of R can 
vary depending on the values of M and N, and by balancing computational speed 
(associated with lower values of R) against accuracy (associated with higher values 
of R). Typical values for R range from 5 to 100. 
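
By way of a minimal, non-limiting sketch, the order-R decomposition in (1) might be computed as follows. The example assumes NumPy; the 5 x 4 count matrix, the choice R = 2, and the helper name `truncated_svd` are hypothetical and chosen only for illustration.

```python
import numpy as np

def truncated_svd(W, R):
    """Return U (M x R), the R largest singular values s, and V (N x R)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U[:, :R], s[:R], Vt[:R].T

# Hypothetical 5-word x 4-document count matrix (illustrative values only).
W = np.array([
    [2.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 0.0, 1.0],
    [0.0, 1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
])
R = 2
U, s, V = truncated_svd(W, R)

word_vectors = U * s              # rows of U S, one per word
doc_vectors = V * s               # rows of V S, one per document
W_approx = U @ np.diag(s) @ V.T   # rank-R approximation of W
```

NumPy returns the singular values in descending order, so keeping the first R columns retains the R largest singular values.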

Please replace the paragraph beginning on page 13 at line 9 with the following 
paragraph: 

FIGS. 4A and 4B illustrate selected components of the adaptive LSA paradigm 
400 using latent semantic adaptation in accordance with an embodiment of the present 
invention. The adaptive LSA paradigm 400 extends the basic LSA paradigm 300 so that 
some or all of the data in new documents 110 are taken into account through incremental 
adaptation of the original LSA space S 316 in a way that is computationally efficient. 
Adaptation of the original LSA space S 316 ensures that the semantic classification error 
rate of the semantic classification unit 112 does not substantially increase as the new 
words and documents 110 vary from those contained in the original training corpus T 
202. 

Please replace the paragraph beginning on page 13 at line 25 with the following 
paragraph: 






With reference to FIG. 4A, if n additional documents contain words drawn from 
the original underlying vocabulary V 206 plus m words previously unseen (i.e. 
out-of-vocabulary words), then the adaptive LSA paradigm 400 constructs a word-document 
matrix $\tilde{W}$ 402 of dimension (M + m) x (N + n) in the same manner as described 
for generating matrix W 302 in the basic LSA paradigm 300 in FIGS. 3A and 3B. Using the 
same order of decomposition R, the SVD of $\tilde{W}$ 402 leads to: 

    $\tilde{W} = \tilde{U} \tilde{S} \tilde{V}^T$,    (4) 

where $\tilde{U}$ 406 is the left singular matrix of dimension (M + m) x R, $\tilde{S}$ 
408 is the diagonal matrix of dimension R x R, and $\tilde{V}$ 410 is the right singular 
matrix of dimension (N + n) x R, each having the same definitions and properties as 
described above for W, U, S, and V in FIGS. 3A and 3B. 

Please replace the paragraph beginning on page 14 at line 9 with the following 
paragraph: 

As shown in FIG. 4A, the m new words are gathered in the m x (N + n) matrix 
$\tilde{C} = [C \; E]$ 422, and the n new documents are gathered in the (M + m) x n 
matrix $\tilde{D} = [D^T \; E^T]^T$ 424. $\tilde{U}$ 406 is expressed as 
$[\tilde{U}_1^T \; \tilde{U}_2^T]^T$, where $\tilde{U}_1^T$ 436 is the transposition of 
the left singular matrix of dimension M x R and $\tilde{U}_2^T$ 438 is the transposition 
of the left singular matrix of dimension m x R. $\tilde{V}^T$ 410 is expressed as 
$[\tilde{V}_1^T \; \tilde{V}_2^T]$, where $\tilde{V}_1^T$ 439 is the transposition of 
the right singular matrix of dimension R x N and $\tilde{V}_2^T$ 440 is the 
transposition of the right singular matrix of dimension R x n. The new decomposition of 
$\tilde{W}$ expressed in (4) leads to a different LSA space S 416, in which the word and 
document vectors are now given by the scaled row vectors $\tilde{u}_i \tilde{S}$ 418 and 
$\tilde{v}_j \tilde{S}$ 420 (i.e. the rows of $\tilde{U}\tilde{S}$ 412 and 
$\tilde{V}\tilde{S}$ 414) to characterize the position of word $w_i$ and 
document $d_j$. 
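
The block structure described above can be sketched as follows, assuming NumPy. The sizes M, N, m, n and the random counts are hypothetical; only the arrangement of the blocks W, D, C, and E and the resulting dimensions are taken from the description.

```python
import numpy as np

M, N = 4, 5   # original vocabulary size and document count (hypothetical)
m, n = 2, 3   # out-of-vocabulary words and additional documents (hypothetical)

rng = np.random.default_rng(0)
W = rng.integers(0, 4, size=(M, N)).astype(float)  # original words x original documents
D = rng.integers(0, 4, size=(M, n)).astype(float)  # original words x new documents
C = rng.integers(0, 4, size=(m, N)).astype(float)  # new words x original documents
E = rng.integers(0, 4, size=(m, n)).astype(float)  # new words x new documents

W_ext = np.block([[W, D], [C, E]])   # extended matrix, (M + m) x (N + n)
C_ext = W_ext[M:, :]                 # [C E]: the m new-word rows
D_ext = W_ext[:, N:]                 # [D^T E^T]^T: the n new-document columns
```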

Please replace the paragraph beginning on page 14 at line 20 with the following 
paragraph: 

FIGS. 7A and 7B illustrate the prior art approach referred to as baseline 
adaptation 700, where the distinction between the SVD in (1) of the original 
word-document co-occurrence matrix W 302 in FIG. 3A and the SVD in (4) of the extended 
word-document co-occurrence matrix $\tilde{W}$ 402 in FIG. 4A is ignored by making the 
(obviously invalid) assumption that the original LSA space S 316 in FIG. 3B is the same 
as the new LSA space S 416 in FIG. 4B. In other words, in baseline adaptation 700, the 
SVD in (1) is still assumed to be valid even after the new documents become available, 
and the problem is reduced to representing the new data in the original LSA space S 316. 

Please replace the paragraph beginning on page 15 at line 1 with the following 
paragraph: 

Referring now to FIGS. 4A and 7B, the baseline adaptation approach 700 treats 
the portions of the matrix $\tilde{W}$ 402 identified as C 430 and D 432 as merely 
extensions of additional rows or columns of the original matrix W 302, and discards 
altogether the portion of the extended matrix $\tilde{W}$ 402 identified as E 434. This 
has the effect of ignoring significant amounts of new data, including any 
out-of-vocabulary words in the new documents. 

Please replace the paragraph beginning on page 15 at line 7 with the following 
paragraph: 



Using the baseline adaptation approach 700, the representation of those portions 
of the new data that will be added to the original LSA space S 316 is obtained from the 
SVD of C 430 and D 432 as follows: 

    $C = Y S V^T$,    (5) 
    $D = U S Z^T$,    (6) 

where the m x R matrix Y 426 and the n x R matrix Z 428 are defined a posteriori (as 
plug-ins), to satisfy the relationship. In essence, using the baseline adaptation framework 
700, the role of matrices Y 426 and Z 428 is to "extend" the original matrices U 306 
and V 310 to accommodate the new data. The original word and document vectors $u_i$ 
318 and $v_j$ 320 are still given by the rows of US 312 and VS 314, but the new word and 
document vectors $y_i$ 446 and $z_j$ 448 are given by the rows of YS 442 and ZS 444, 
respectively. From (5) and (6), these are seen to be: 

    $Y S = C V$,    (7) 
    $Z S = D^T U$.    (8) 

The effect, illustrated in FIG. 7B, is that the original LSA space S 316 becomes 
populated with the new data, i.e. the new word and document vectors $y_i$ 446 and 
$z_j$ 448, hence the name "folding-in." 
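
The folding-in computations of (7) and (8) can be illustrated with a minimal sketch, assuming NumPy. The matrix sizes are hypothetical, and the extended matrix is deliberately constructed with exact rank R so that relationships (5) and (6) hold exactly; with real co-occurrence counts they would hold only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, m, n, R = 8, 10, 2, 3, 3   # hypothetical sizes

# Build an extended matrix of exact rank R so that (5) and (6) hold exactly.
A = rng.standard_normal((M + m, R))
B = rng.standard_normal((R, N + n))
W_ext = A @ B
W, D = W_ext[:M, :N], W_ext[:M, N:]
C = W_ext[M:, :N]

# SVD of the original matrix W, as in (1), truncated to order R.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U, s, V = U[:, :R], s[:R], Vt[:R].T

YS = C @ V      # (7): folded-in word vectors, m x R
ZS = D.T @ U    # (8): folded-in document vectors, n x R

Y = YS / s      # plug-in matrix Y, m x R (each column divided by its singular value)
Z = ZS / s      # plug-in matrix Z, n x R
```

Note that the block E is never used, which mirrors the way baseline adaptation discards it.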
Please replace the paragraph beginning on page 15 at line 24 with the following 
paragraph: 

A major drawback to the above-described baseline adaptation approach 700 
illustrated in FIG. 7B is poor performance, since even when populated with the new word 
and document vectors $y_i$ 446 and $z_j$ 448, the misclassification error rate using the 
original LSA space S 316 is still high when the new words and documents vary from the 
original training corpus T 202, e.g. when the new documents contain several new words 
not in the original training corpus. 

Please replace the paragraph beginning on page 16 at line 3 with the following 
paragraph: 






In contrast, the latent semantic adaptation approach of the present invention 
achieves significant reductions in the misclassification error rate. Unlike baseline 
adaptation 700, the latent semantic adaptation approach of the present invention 
recognizes that there is an important distinction between the SVD in (1) of the original 
word-document co-occurrence matrix W 302 in FIG. 3A and the SVD in (4) of the 
extended word-document co-occurrence matrix $\tilde{W}$ 402 in FIG. 4A that must be 
taken into account since the original LSA space S 316 in FIG. 3B is not the same as the 
new LSA space S 416 in FIG. 4B. In other words, the SVD in (1) is no longer valid after 
the new documents become available, so the problem is more than just representing the 
new data in the original LSA space S 316. Therefore, in one embodiment, the latent 
semantic adaptation approach treats the portions of the matrix $\tilde{W}$ 402 
identified as C 430 and/or D 432 in FIG. 4A as new data that must be accounted for in a 
new LSA space S 416. In one embodiment, the portion of the matrix $\tilde{W}$ 402 
identified as E 434 in FIG. 4A is also treated as new data that must be accounted for in 
a new LSA space S 416. 

Please replace the paragraph beginning on page 16 at line 17 with the following 
paragraph: 



In one embodiment of latent semantic adaptation, the scaled row vectors (i.e. the 
rows of $\tilde{U}\tilde{S}$ 412 and $\tilde{V}\tilde{S}$ 414) are obtained directly 
from the SVD of the entire matrix $\tilde{W}$ 402 in (4) using a latent semantic 
adaptation framework 400 as defined in the equations that follow. By inspection from 
FIG. 4A, 

    $C = \tilde{U}_2 \tilde{S} \tilde{V}_1^T$,    (9) 
    $D = \tilde{U}_1 \tilde{S} \tilde{V}_2^T$,    (10) 

and 

    $W = \tilde{U}_1 \tilde{S} \tilde{V}_1^T$,    (11) 
    $E = \tilde{U}_2 \tilde{S} \tilde{V}_2^T$,    (12) 

where $\tilde{U}$ and $\tilde{V}$ are each column-orthonormal, i.e., 
$\tilde{U}^T \tilde{U} = \tilde{V}^T \tilde{V} = I_R$ (the identity matrix of order R). 
The orthogonality constraints can also be expressed in terms of $\tilde{U}_1$, 
$\tilde{U}_2$, $\tilde{V}_1$, and $\tilde{V}_2$ as follows: 

    $\tilde{U}^T \tilde{U} = I_R = \tilde{U}_1^T \tilde{U}_1 + \tilde{U}_2^T \tilde{U}_2$,    (13) 
    $\tilde{V}^T \tilde{V} = I_R = \tilde{V}_1^T \tilde{V}_1 + \tilde{V}_2^T \tilde{V}_2$.    (14) 

In one embodiment, the foregoing equations (9)-(14) define the latent semantic adaptation 
framework 400 of the method of the present invention. The latent semantic adaptation 
framework 400 is used to solve for the "extension" SVD matrices $\tilde{U}$ 406, 
$\tilde{S}$ 408, and $\tilde{V}$ 410 as a function of the original SVD matrices U 306, 
S 308, V 310, and "extension" SVD matrices Y 426, and Z 428. 



Please replace the paragraph beginning on page 17 at line 11 with the following 
paragraph: 

According to one embodiment, the solution is obtained by setting up a latent 
semantic adaptation transformation 500, as illustrated in FIGS. 5A and 5B, based on the 
assumptions previously noted that the dimension R of the original LSA space S 316 is 
low enough that none of the corresponding R singular values are zero, and that the 
transformation necessary to adapt the original LSA space S 316 is invertible. Starting 
with $\tilde{S}$ 408, the shift from S 308 in FIG. 3A to $\tilde{S}$ 408 in FIG. 4A can 
be captured as illustrated in FIGS. 5A and 5B by the following expressions: 

    $\tilde{U}_1 = U G$,    (15) 
    $\tilde{V}_1 = V H$,    (16) 

where G 508 and H 518 are (R x R) matrices that, according to the second assumption, 
are assumed to be invertible. Taken together, (15) and (16) define a latent semantic 
adaptation matrix transformation 500 to apply to the original SVD matrices U 306 and 
V 310 to update them according to the new data. 

Please replace the paragraph beginning on page 20 at line 7 with the following 

paragraph: 






From equations (17), (28), and (29), it is clear that: 

    $(G\tilde{S})(G\tilde{S})^T = G \tilde{S}^2 G^T = S H^{-T} H^{-1} S = S (I_R + Z^T Z) S$,    (37) 
    $(H\tilde{S})(H\tilde{S})^T = H \tilde{S}^2 H^T = S G^{-T} G^{-1} S = S (I_R + Y^T Y) S$.    (38) 

Thus, it is also possible to obtain $G\tilde{S}$ and $H\tilde{S}$ directly through 
Choleski decomposition, in a manner analogous to that mentioned above for G 508 and 
H 518. In fact, as illustrated in FIGS. 6A and 6B, if J 618 and K 608 are the solutions 
of relevant Choleski decompositions, viz.: 

    $J J^T = (I_R + Y^T Y)$,    (39) 
    $K K^T = (I_R + Z^T Z)$,    (40) 

then equations (35)-(38) admit as solutions: 

    $\tilde{U}\tilde{S} = \begin{bmatrix} US \\ YS \end{bmatrix} K$,    (41) 
    $\tilde{V}\tilde{S} = \begin{bmatrix} VS \\ ZS \end{bmatrix} J$.    (42) 



Please replace the paragraph beginning on page 20 at line 20 with the following 
paragraph: 

In other words, in accordance with one embodiment of the present invention, the 
original vectors US 312 and VS 314, as well as the new vectors resulting from the 
"folding-in" process YS 442 and ZS 444, can be transformed using a latent semantic 
adaptation vector transformation 600 defined by the transformation matrices K 608 in 
FIG. 6A and J 618 in FIG. 6B to respectively yield the updated word vectors 
$\tilde{U}\tilde{S}$ 412 and document vectors $\tilde{V}\tilde{S}$ 414. Therefore, 
equations (41) and (42) make it possible to adapt the original LSA space S 316 of 
FIG. 3B to the new LSA space S 416 of FIG. 4B. 
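
The vector transformation just described, in which the original and folded-in vectors are stacked and multiplied by K and J, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: it assumes NumPy, hypothetical matrix sizes, and an extended matrix constructed with exact rank R so that the relationships of the framework hold exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, m, n, R = 8, 10, 2, 3, 3   # hypothetical sizes

# Exact rank-R extended matrix, so the framework's relations hold exactly.
W_ext = rng.standard_normal((M + m, R)) @ rng.standard_normal((R, N + n))
W, D, C = W_ext[:M, :N], W_ext[:M, N:], W_ext[M:, :N]

# Original LSA space, then folding-in of the new words and documents.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U, s, V = U[:, :R], s[:R], Vt[:R].T
US, VS = U * s, V * s
YS, ZS = C @ V, D.T @ U
Y, Z = YS / s, ZS / s

# Choleski factors: J J^T = I_R + Y^T Y and K K^T = I_R + Z^T Z.
I_R = np.eye(R)
J = np.linalg.cholesky(I_R + Y.T @ Y)
K = np.linalg.cholesky(I_R + Z.T @ Z)

# Stack original and folded-in vectors, then apply K (words) and J (documents).
words_adapted = np.vstack([US, YS]) @ K   # (M + m) x R
docs_adapted = np.vstack([VS, ZS]) @ J    # (N + n) x R
```

Because a Choleski factor is determined only up to an orthogonal rotation, a natural check in this sketch is on pairwise inner products: `words_adapted @ words_adapted.T` can be compared with `W_ext @ W_ext.T`, and `docs_adapted @ docs_adapted.T` with `W_ext.T @ W_ext`. No SVD of the extended matrix is computed at any point.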



Please replace the paragraph beginning on page 21 at line 3 with the following 
paragraph: 

In one embodiment of the latent semantic adaptation framework 400, the new 
information, as reflected through the transformation matrices K 608 and J 618, affects 
both original word and document vectors $u_i$ 318 and $v_j$ 320 and new word and 
document vectors $y_i$ 446 and $z_j$ 448, which is referred to as two-sided adaptation. 
Stated another way, the transformed representation of the new word and document vectors 
$y_i$ 446 and $z_j$ 448 takes into account its own influence on the underlying semantic 
knowledge that was encapsulated in the original LSA space S 316 of FIG. 3B (i.e. the 
existing word and document vectors $u_i$ 318 and $v_j$ 320) to yield the transformed 
word and document vectors $\tilde{u}_i$ 418 and $\tilde{v}_j$ 420 that populate the new 
LSA space S 416 of FIG. 4B. As indicated by the arrows in the new LSA space S 416 of 
FIG. 4B, the positions of both the words and documents represented by original word and 
document vectors $u_i$ 318 and $v_j$ 320 have shifted from their positions in the 
original LSA space S 316 to reflect their changed position (i.e. their relationship) 
within the new LSA space S 416. The new LSA space S 416 not only allows for improvements 
in the misclassification error rate, but also provides the ability to adapt the speech 
recognition database that embodies the new LSA space S 416 in real-time, because the 
application of the transformation matrices K 608 and J 618 is computationally efficient 
and bypasses the need to re-compute the LSA space. 

Please replace the paragraph beginning on page 22 at line 3 with the following 
paragraph: 

In addition to providing improved performance through lowering the 
misclassification rate, it is also worth noting that the latent semantic adaptation 
framework 400 and resulting latent semantic adaptation matrix and vector 
transformations 500 and 600, respectively, are computationally efficient. Compared to the 
"folding-in" computations of the baseline adaptation approach 700, the latent semantic 
adaptation matrix and vector transformations 500 and 600 of the latent semantic 
adaptation framework 400 entail less overhead. For example, in terms of the number of 
floating point operations required, the overhead associated with the latent semantic 
adaptation vector transformations 600 embodied in equations (39)-(42) can be expressed 
as: 

    $\frac{2}{3} R^3 + [(M + N) + 2(m + n) - 1] R^2 + (m + n + 1) R$.    (43) 

For typical values of the various dimensions involved, expression (43) will be dominated 
by $(M + N) R^2$. Depending on the application, this quantity may fall anywhere between 
about 50 million (for voice command and control types of speech recognition 
applications using a limited vocabulary) and more than 1 billion (for large vocabulary 
transcription). Still, on current high-end machines, this quantity only represents up to a 
few seconds of central processing unit (CPU) time. Compared to recomputing the SVD 
from scratch, which requires O(MNR) operations, the computational complexity is 
reduced by a factor of approximately min(M, N)/R. In many speech recognition 
applications, the reduction factor will be on the order of 1000. In such cases, the latent 
semantic adaptation framework 400 and resulting latent semantic adaptation matrix and 
vector transformations 500 and 600 make it practical to adapt the new LSA space S 416 